We present the first evidence from a hadron collider of WW + WZ production with semi-leptonic decays. The data were recorded by the D0 detector at the Fermilab Tevatron and correspond to 1.07 fb$^{-1}$ of integrated luminosity obtained in proton-antiproton collisions at $\sqrt{s} = 1.96$ TeV. The cross section observed for WW + WZ production is 20.2 ± 4.5 pb with a significance of 4.4 standard deviations.



1 Introduction



There are many reasons for studying $WW/WZ \to \ell\nu q\bar{q}$ at the Tevatron. From the electroweak perspective, diboson production provides a probe of the self-interactions of vector bosons. Deviations of these trilinear gauge boson couplings from the Standard Model (SM) would affect the cross sections and event kinematics of diboson production. The cross sections for diboson production at the Tevatron had previously been measured only in fully leptonic final states, so this analysis provides a complement to the previous measurements.

Reconstructing WW and WZ events in semi-leptonic final states is challenging because the signal must be separated from the dominant background of W bosons produced in association with jets. Many Higgs boson searches, e.g. $WH \to \ell\nu b\bar{b}$, share this challenge, which makes this measurement a benchmark for them. Furthermore, this analysis provides a proving ground for the multivariate event-classification schemes and the accompanying statistical techniques used in the Tevatron Higgs boson searches across the entire mass range allowed by the SM.



2 Event Selection 



To select candidate events for $p\bar{p} \to WW/WZ \to \ell\nu q\bar{q}$, we required a single reconstructed lepton (electron or muon) with transverse momentum $p_T > 20$ GeV and $|\eta| < 1.1$ (for electrons) or $|\eta| < 2$ (for muons), an imbalance in transverse momentum $\not\!\!E_T > 20$ GeV, and at least two jets with $p_T > 20$ GeV and $|\eta| < 2.5$. The leading jet (i.e., the one with the highest $p_T$) was also required to have $p_T > 30$ GeV. To reduce background from processes that do not contain $W \to \ell\nu$, we required a transverse W mass $M_T^W > 35$ GeV, where $M_T^W = \sqrt{(E_T)^2 - (p_T)^2}$, with $E_T$ the scalar sum of the transverse energies of the lepton and $\not\!\!E_T$, and $p_T$ the magnitude of their vector-summed transverse momentum. The electron or muon was required to be isolated from other objects in the calorimeter and to match a track reconstructed in the central tracking system that originated from the primary vertex. The muon also had to be reconstructed as an isolated track in the central tracking system. The resulting kinematic distributions are shown in Fig. 1.
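The kinematic part of this selection can be sketched as a simple event filter. This is a minimal illustration under stated assumptions, not D0 code: the `Lepton`/`Jet` containers and `passes_selection` are hypothetical names, the isolation and track-matching requirements are not modeled, and the lepton and neutrino are treated as massless, so that $E_T = p_T^\ell + \not\!\!E_T$ and $p_T$ is the magnitude of the vector sum of the lepton momentum and the missing transverse momentum.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Lepton:
    flavor: str  # "e" or "mu"
    pt: float    # transverse momentum in GeV
    eta: float   # pseudorapidity
    phi: float   # azimuthal angle in radians


@dataclass
class Jet:
    pt: float   # GeV
    eta: float


def transverse_mass(lep: Lepton, met: float, met_phi: float) -> float:
    """M_T = sqrt(E_T^2 - p_T^2) of the lepton + missing-E_T system, both treated as massless."""
    et = lep.pt + met  # scalar sum of transverse energies
    px = lep.pt * math.cos(lep.phi) + met * math.cos(met_phi)
    py = lep.pt * math.sin(lep.phi) + met * math.sin(met_phi)
    return math.sqrt(max(et * et - px * px - py * py, 0.0))  # guard against tiny negative rounding


def passes_selection(lep: Lepton, met: float, met_phi: float, jets: List[Jet]) -> bool:
    """Kinematic cuts only (hypothetical helper); isolation and track matching are not modeled."""
    max_eta = 1.1 if lep.flavor == "e" else 2.0
    if lep.pt <= 20.0 or abs(lep.eta) >= max_eta:
        return False
    if met <= 20.0:
        return False
    good_jets = sorted((j for j in jets if j.pt > 20.0 and abs(j.eta) < 2.5),
                       key=lambda j: j.pt, reverse=True)
    if len(good_jets) < 2 or good_jets[0].pt <= 30.0:  # two jets, leading one above 30 GeV
        return False
    return transverse_mass(lep, met, met_phi) > 35.0
```

A lepton and missing transverse momentum that are back to back give the maximal $M_T^W = 2\sqrt{p_T^\ell \not\!\!E_T}$ for fixed magnitudes, while collinear ones give zero and fail the 35 GeV cut.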
Figure 1: Kinematic distributions after all selection requirements: (a) $p_T$ of the lepton; (b) $\not\!\!E_T$; (c) transverse W mass; (d) $p_T$ of the leading jet; (e) $p_T$ of the second-leading jet; (f) dijet mass.



3 Data Sample 

The data were collected with the D0 detector at the Fermilab Tevatron Collider at a center-of-mass energy of $\sqrt{s} = 1.96$ TeV. The events studied in this analysis correspond to 1.07 fb$^{-1}$ of integrated luminosity collected during Run IIa (2002-2006). To be considered for analysis, events in the $e\nu q\bar{q}$ channel were required to pass at least one single-electron or electron+jet(s) trigger. The resulting trigger efficiency was $98^{+2}_{-3}$%. A suite of triggers was used for the $\mu\nu q\bar{q}$ channel, resulting in a trigger efficiency of nearly 100%.

4 Signal and Background Estimations 

Monte Carlo (MC) generators were used to simulate the signal and background samples that contained a charged lepton in the final state. Signal events were generated with PYTHIA using CTEQ6L parton distribution functions (PDFs). ALPGEN with CTEQ6L1 PDFs was used to generate W+jets, Z+jets, and $t\bar{t}$ events, and COMPHEP with CTEQ6L1 PDFs was used to simulate single-top events. All ALPGEN and COMPHEP events used PYTHIA for parton showering and hadronization. After generation, the events underwent a GEANT-based detector simulation before being reconstructed with the same programs as the data.

Table 1: Measured number of events for signal and each background after the combined fit of the RF distribution (with total uncertainties determined from the fit) and the observed number of selected events.

Process                  $e\nu q\bar{q}$ channel    $\mu\nu q\bar{q}$ channel
Diboson signal           436 ± 36                   527 ± 43
W+jets                   10100 ± 500                11910 ± 590
Z+jets                   387 ± 61                   1180 ± 180
$t\bar{t}$ + single top  436 ± 57                   426 ± 54
Multijet                 1100 ± 200                 328 ± 83
Total predicted          12460 ± 550                14370 ± 620
Data                     12473                      14392
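As a quick consistency check on Table 1, summing the central values reproduces the quoted totals to within rounding and quantifies how small the signal is: the signal-to-background ratio is below 4% in both channels, and W+jets makes up more than 80% of the prediction. The dictionary below simply transcribes Table 1; the helper name `summarize` is hypothetical.

```python
# Fitted yields from Table 1 (central values only), transcribed by hand.
YIELDS = {
    "e":  {"diboson": 436, "wjets": 10100, "zjets": 387, "top": 436, "multijet": 1100},
    "mu": {"diboson": 527, "wjets": 11910, "zjets": 1180, "top": 426, "multijet": 328},
}
QUOTED_TOTAL = {"e": 12460, "mu": 14370}


def summarize(channel: str):
    """Return (total predicted, signal/background ratio, W+jets fraction of the total)."""
    y = YIELDS[channel]
    total = sum(y.values())
    background = total - y["diboson"]
    return total, y["diboson"] / background, y["wjets"] / total


for ch in ("e", "mu"):
    total, s_over_b, w_frac = summarize(ch)
    print(f"{ch}: total={total} (quoted {QUOTED_TOTAL[ch]}), "
          f"S/B={s_over_b:.3f}, W+jets fraction={w_frac:.2f}")
```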

With the exception of W+jets, all background MC samples were normalized to next-to-leading-order (NLO) or next-to-next-to-leading-order SM predictions. The W+jets normalization was determined simultaneously with the signal cross section by a fit to data, as discussed in Section 7.
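Normalizing an MC sample to a predicted cross section amounts to giving each generated event the weight $\sigma L / N_\text{gen}$. The sketch below illustrates this with hypothetical function names; the cross section and sample sizes one would pass in are placeholders, not values from this analysis. Only the luminosity, 1.07 fb$^{-1}$ = 1070 pb$^{-1}$, comes from the text.

```python
LUMI_PB_INV = 1070.0  # 1.07 fb^-1 expressed in pb^-1


def mc_event_weight(sigma_pb: float, lumi_pb_inv: float, n_generated: int) -> float:
    """Weight per generated event so that the full sample corresponds to sigma * L events."""
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    return sigma_pb * lumi_pb_inv / n_generated


def expected_selected(sigma_pb: float, lumi_pb_inv: float,
                      n_generated: int, n_selected: int) -> float:
    """Predicted number of selected data events from a normalized MC sample."""
    return mc_event_weight(sigma_pb, lumi_pb_inv, n_generated) * n_selected
```

For example, a hypothetical 100 pb process simulated with one million events would receive a weight of 0.107 per event at this luminosity.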

The probability for a multijet event to mimic a lepton and pass all selection cuts was quite 
small; however, because the cross section for multijet production is so large, the background 
from multijet events had to be accounted for. For the $\mu\nu q\bar{q}$ channel, the multijet background was modeled with "anti-isolated" data corresponding to events that failed the muon isolation requirements but passed all other selections. The kinematic distributions of the anti-isolated data were corrected for contributions from processes already modeled via MC. The normalization of the multijet background in the muon channel was determined from a fit to the transverse W mass distribution of the $\mu + \nu$ system.
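The text states only that the muon-channel multijet normalization came from a fit to the transverse W mass distribution. One simple way to perform such a fit, assumed here for illustration, is to scale the anti-isolated template $q_i$ by a single factor $k$ while holding the MC predictions $m_i$ fixed, and to minimize $\chi^2 = \sum_i (d_i - m_i - k q_i)^2 / d_i$ over the data bins $d_i$. Setting the derivative to zero gives the closed form $k = \sum_i (d_i - m_i) q_i / d_i \,/\, \sum_i q_i^2 / d_i$. The function name is hypothetical.

```python
from typing import Sequence


def fit_template_scale(data: Sequence[float], fixed_mc: Sequence[float],
                       template: Sequence[float]) -> float:
    """Scale k minimizing sum((d - m - k*q)^2 / d) over bins with d > 0 (Neyman-style weights)."""
    num = 0.0
    den = 0.0
    for d, m, q in zip(data, fixed_mc, template):
        if d <= 0:
            continue  # weights 1/d are undefined for empty bins
        num += (d - m) * q / d
        den += q * q / d
    if den == 0.0:
        raise ValueError("template is empty in all populated bins")
    return num / den
```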

For the $e\nu q\bar{q}$ channel, the multijet background was estimated using a "loose-but-not-tight" (LNT) data sample obtained by selecting events that passed a loosened electron-quality requirement but failed the electron-quality requirement of the final selection. To obtain the correct rate of multijet events, each LNT event was weighted according to the probability for a jet to mimic an electron. The contribution from events already modeled via MC was also subtracted.
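The weighting of LNT events is not spelled out in the text. A common choice, assumed in this sketch, is the fake-factor weight $f/(1-f)$, where $f$ is the probability that a jet passing the loosened electron requirement also passes the final one; the same weight is applied to MC events that land in the LNT sample so that their contribution can be subtracted. The function names and the constant fake rate used in testing are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def fake_factor(f: float) -> float:
    """Assumed LNT weight f / (1 - f); f = P(jet passes tight | jet passes loose)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("fake rate must lie in [0, 1)")
    return f / (1.0 - f)


def multijet_estimate(lnt_data_pts: Iterable[float],
                      lnt_mc: Iterable[Tuple[float, float]],
                      fake_rate: Callable[[float], float]) -> float:
    """Weighted LNT data minus the same weighting applied to LNT MC events given as (pt, mc_weight)."""
    data_term = sum(fake_factor(fake_rate(pt)) for pt in lnt_data_pts)
    mc_term = sum(w * fake_factor(fake_rate(pt)) for pt, w in lnt_mc)
    return data_term - mc_term
```

In practice the fake rate would be parameterized (for example in electron $p_T$) and measured in a multijet-enriched control sample; the callable argument leaves that choice open.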



5 MC Corrections and Systematic Uncertainties 

As Table 1 shows, the selected events were dominated by the W+jets background, so accurate modeling of W+jets was of particular importance. We performed detailed studies of the ALPGEN W+jets MC sample and the associated sources of uncertainty. Comparisons with other generators and with data showed discrepancies in the modeling of jet $\eta$ and of the separation $\Delta R$ between jets. Therefore, the data were used to correct these quantities in the ALPGEN W+jets and Z+jets samples. The effect of the diboson signal on the derived corrections was small, but it was nonetheless taken into account via a systematic uncertainty assigned to the procedure. The ALPGEN W+jets sample was also assigned systematic uncertainties for variations of the renormalization (and factorization) scale and of the jet-parton matching parameters. PDF uncertainties were evaluated for all MC samples, as were uncertainties from object reconstruction and identification. A full list of the systematic uncertainties and the magnitude of each is given in Table 2. We considered systematic uncertainties that affected both the normalization and the shapes of kinematic distributions.
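The angular separation between jets is conventionally $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$. The text does not give the correction procedure, so the sketch below assumes a simple binned shape reweighting: the correction in each bin is the ratio of data minus non-W+jets MC to W+jets MC, rescaled so that the total W+jets yield is unchanged (its normalization is floated in the final fit in any case). All function names are hypothetical.

```python
import math
from typing import List, Sequence


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Delta R = sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def shape_correction(data: Sequence[float], other_mc: Sequence[float],
                     wjets_mc: Sequence[float]) -> List[float]:
    """Per-bin factors (data - other MC) / W+jets MC, rescaled to preserve the W+jets total."""
    raw = [(d - o) / w if w > 0 else 1.0 for d, o, w in zip(data, other_mc, wjets_mc)]
    corrected_total = sum(w * r for w, r in zip(wjets_mc, raw))
    norm = sum(wjets_mc) / corrected_total
    return [r * norm for r in raw]
```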



Table 2: Systematic uncertainties (in %) for Monte Carlo simulations and multijet estimates. Uncertainties are identical for both lepton channels except where indicated otherwise. The nature of the uncertainty, i.e., whether it had a differential dependence (D) or affected only the normalization (N), is also provided. The values for uncertainties with a differential dependence correspond to the RMS amplitudes in the RF output distribution. Also provided is the contribution of each source to the total systematic uncertainty of 3.6 pb on the measured cross section.



Source of systematic uncertainty                   Diboson  W+jets   Z+jets   Top      Multijet $\Delta\sigma$ (pb)
Trigger efficiency, $e\nu q\bar{q}$ channel        +2/-3    +2/-3    +2/-3    +2/-3             < 0.1
Trigger efficiency, $\mu\nu q\bar{q}$ channel      +0/-5    +0/-5    +0/-5    +0/-5             < 0.1
Lepton identification                              ±4       ±4       ±4       ±4                < 0.1
Jet identification                                 ±1       ±1       ±1       ±<1               0.3
Jet energy scale                                   ±4       ±9       ±9       ±4                1.9
Jet energy resolution                              ±3       ±4       ±4       ±4                < 0.1
Cross section                                               ±20      ±6       ±10               1.1
Multijet normalization, $e\nu q\bar{q}$ channel                                        ±20      0.9
Multijet normalization, $\mu\nu q\bar{q}$ channel                                      ±30      0.5
Multijet shape, $e\nu q\bar{q}$ channel                                                ±6       < 0.1
Multijet shape, $\mu\nu q\bar{q}$ channel                                              ±10      < 0.1
Diboson signal NLO/LO shape                        ±10                                          < 0.1
Parton distribution function                       ±1       ±1       ±1       ±1                0.2
ALPGEN $\eta$ and $\Delta R$ corrections                    ±1       ±1                         < 0.1
Renormalization and factorization scale                     ±3       ±3                         0.9
ALPGEN parton-jet matching parameters                       ±4       ±4                         2.4
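As a rough cross-check of the 3.6 pb total quoted in the caption of Table 2, the individual contributions can be added in quadrature, assuming independent sources as in the fit of Section 7. This gives about 3.55 pb, consistent with 3.6 pb given the rounding of the individual entries; the fit itself also treats shape effects, so exact agreement is not expected. The sum also makes clear that the ALPGEN parton-jet matching (2.4 pb) and the jet energy scale (1.9 pb) dominate.

```python
import math

# Contributions to the cross-section uncertainty (pb), transcribed from Table 2.
DELTA_SIGMA_PB = {
    "Jet identification": 0.3,
    "Jet energy scale": 1.9,
    "Cross section": 1.1,
    "Multijet normalization, e channel": 0.9,
    "Multijet normalization, mu channel": 0.5,
    "Parton distribution function": 0.2,
    "Renormalization and factorization scale": 0.9,
    "ALPGEN parton-jet matching parameters": 2.4,
}
N_BELOW_0P1 = 8  # rows quoted as "< 0.1"


def quadrature_sum(values) -> float:
    """Square root of the sum of squares (independent uncertainties)."""
    return math.sqrt(sum(v * v for v in values))


LOWER = quadrature_sum(DELTA_SIGMA_PB.values())         # "< 0.1" rows treated as 0
UPPER = math.sqrt(LOWER ** 2 + N_BELOW_0P1 * 0.1 ** 2)  # "< 0.1" rows treated as 0.1
print(f"quadrature sum: {LOWER:.2f} to {UPPER:.2f} pb (quoted total: 3.6 pb)")
```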



6 Multivariate Classification 

Improved separation between the signal and the backgrounds was achieved using a multivariate classification technique that combines information from several kinematic variables. The technique used was a random forest (RF) classifier from the StatPatternRecognition software package. The RF algorithm creates many decision-tree classifiers, each of which is a series of optimized binary splits that separate signal from background. The RF output is then formed by averaging over all of the decision trees. The key feature of the RF is that each decision tree uses only a subset of the input variables (selected randomly for each tree) and is trained on a bootstrap replica of the full training set. As a result, each tree is trained differently and therefore generalizes differently to unseen data. Averaging all of the trees then yields an accurate and stable classifier.
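The averaging scheme described above can be illustrated with a small, dependency-free toy. This is not StatPatternRecognition and not the configuration used in the analysis: the function names, the Gini split criterion, the depth limit, the roughly 20 candidate cuts per variable, and the choice of two variables per tree are assumptions made to keep the example short. What it shares with the description is the structure: bootstrap replicas, a random subset of variables per tree, and an output equal to the average over trees.

```python
import random


def gini(labels):
    """Gini impurity 2p(1 - p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def build_tree(X, y, features, depth, max_depth=3, min_leaf=5):
    """Grow a tree by greedy binary cuts on the allowed features; leaves hold the signal fraction."""
    if depth >= max_depth or len(set(y)) == 1 or len(y) < 2 * min_leaf:
        return ("leaf", sum(y) / len(y))
    best = None
    for f in features:
        values = sorted(x[f] for x in X)
        step = max(1, len(values) // 20)  # about 20 candidate cuts per variable
        for cut in values[step::step]:
            left = [lab for x, lab in zip(X, y) if x[f] < cut]
            right = [lab for x, lab in zip(X, y) if x[f] >= cut]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, cut)
    if best is None:
        return ("leaf", sum(y) / len(y))
    _, f, cut = best
    li = [i for i, x in enumerate(X) if x[f] < cut]
    ri = [i for i, x in enumerate(X) if x[f] >= cut]
    return ("node", f, cut,
            build_tree([X[i] for i in li], [y[i] for i in li], features, depth + 1, max_depth, min_leaf),
            build_tree([X[i] for i in ri], [y[i] for i in ri], features, depth + 1, max_depth, min_leaf))


def predict_tree(node, x):
    """Follow the cuts down to a leaf and return its signal fraction."""
    while node[0] == "node":
        _, f, cut, left, right = node
        node = left if x[f] < cut else right
    return node[1]


def train_forest(X, y, n_trees=25, vars_per_tree=2, seed=0, **tree_opts):
    """Each tree sees a bootstrap replica of (X, y) and a random subset of the input variables."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]
        feats = rng.sample(range(len(X[0])), vars_per_tree)
        forest.append(build_tree([X[i] for i in boot], [y[i] for i in boot], feats, 0, **tree_opts))
    return forest


def predict_forest(forest, x):
    """RF output: average of all tree outputs, from 0 (background-like) to 1 (signal-like)."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)
```

Trees that do not receive a discriminating variable return the same output everywhere along that variable, so they dilute the average rather than bias it.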

The inputs to the RF were thirteen well-modeled kinematic variables that showed a difference in probability density between the signal and at least one of the backgrounds. An RF for each channel was trained using one half of each MC sample. The other halves, along with the
multijet background samples, were used to evaluate the RF output distributions for comparison 
to the data. These RF output distributions were then used to measure the excess of events in 
the data consistent with the kinematics of WW and WZ production (over that expected from 
multijet and other SM processes). 
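The half-and-half use of the MC samples can be sketched as follows. The random shuffle, the binning, and the rescaling of the evaluation half to the full-sample yield are assumptions for illustration (the text says only that one half of each sample was used for training and the other half for the output distributions); the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_halves(events: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle an MC sample and cut it into a training half and an evaluation half."""
    order = list(range(len(events)))
    random.Random(seed).shuffle(order)
    half = len(order) // 2
    return [events[i] for i in order[:half]], [events[i] for i in order[half:]]


def rf_template(eval_events: Sequence[Tuple[float, float]], full_sample_yield: float,
                n_bins: int = 10) -> List[float]:
    """Histogram (rf_output, weight) pairs on [0, 1], rescaled to the full-sample yield."""
    eval_yield = sum(w for _, w in eval_events)
    scale = full_sample_yield / eval_yield
    hist = [0.0] * n_bins
    for out, w in eval_events:
        b = min(int(out * n_bins), n_bins - 1)  # an output of exactly 1.0 goes in the last bin
        hist[b] += w * scale
    return hist
```

Keeping the training and evaluation halves disjoint prevents the RF from being evaluated on events it has already seen, which would make its output distributions look more signal-like than they are.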



7 Cross Section Measurement 

The cross section for WW + WZ production was determined from a fit of signal and background RF templates to the data by minimizing a Poisson $\chi^2$ function in which the systematic uncertainties were allowed to vary. The systematic uncertainties were treated as Gaussian-distributed uncertainties on the expected numbers of signal and background events in each bin of the RF distribution. Each individual uncertainty was treated as 100% correlated between channels, between samples, and from bin to bin, while different sources of uncertainty were assumed to be independent.
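One common way to write such a fit, shown here as an illustration rather than the exact definition used in the analysis, is a Poisson likelihood-ratio $\chi^2$ with one Gaussian-constrained nuisance parameter $S_k$ per independent source $k$:

```latex
\chi^2(\vec{R},\vec{S}) = 2\sum_i \left[ p_i - d_i + d_i \ln\frac{d_i}{p_i} \right] + \sum_k S_k^2,
\qquad
p_i = \sum_j R_j\, n_{ij} \Bigl( 1 + \sum_k S_k\, \delta_{ijk} \Bigr)
```

Here $d_i$ is the observed count in RF bin $i$, $n_{ij}$ the nominal prediction of sample $j$, $R_j$ its normalization (free for the signal and W+jets, fixed to 1 otherwise), and $\delta_{ijk}$ the fractional change in $n_{ij}$ for a one-standard-deviation shift of source $k$; the $d_i \ln(d_i/p_i)$ term is taken as zero when $d_i = 0$. Using the same $S_k$ in every bin, sample, and channel makes each source 100% correlated, while separate $S_k$ for different sources makes them independent.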

The normalizations of the RF templates for the signal and the W+jets background were unconstrained in the fit, allowing the fit to simultaneously measure the signal cross section and determine the normalization of the dominant background. This approach eliminated the need to use the W+jets cross section predicted by ALPGEN and provided an unbiased uncertainty for the normalization of the dominant background. As a check of the procedure, the fit yielded an effective k-factor of 1.53 ± 0.13 that needed to be applied to the ALPGEN cross section to best match the data, which is close to what one would expect from the ratio of NLO to LO predictions for the W+jets cross section.
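A stripped-down version of this fit, with the systematic nuisance parameters dropped for brevity, is sketched below: two free scale factors, one for the signal template and one for the W+jets template, are fitted to binned data by minimizing the Poisson $\chi^2$. The minimizer (coordinate descent with golden-section line searches) is an assumption chosen to stay dependency-free, not the tool used in the analysis; the fitted W+jets scale plays the role of the effective k-factor. The function names and the toy templates used in testing are hypothetical, and every bin is assumed to have a positive fixed background so that the prediction never vanishes.

```python
import math
from typing import Callable, Sequence, Tuple


def poisson_chi2(data: Sequence[float], pred: Sequence[float]) -> float:
    """2 * sum(p - d + d*ln(d/p)); the log term is dropped when d = 0."""
    total = 0.0
    for d, p in zip(data, pred):
        total += 2.0 * (p - d)
        if d > 0:
            total += 2.0 * d * math.log(d / p)
    return total


def golden_min(f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-7) -> float:
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def fit_normalizations(data, signal, wjets, fixed_bkg, n_iter: int = 50) -> Tuple[float, float]:
    """Free scale factors for the signal and W+jets templates; other backgrounds held fixed."""
    def chi2(mu_s: float, mu_w: float) -> float:
        pred = [mu_s * s + mu_w * w + b for s, w, b in zip(signal, wjets, fixed_bkg)]
        return poisson_chi2(data, pred)

    mu_s, mu_w = 1.0, 1.0
    for _ in range(n_iter):  # coordinate descent; chi^2 is convex in (mu_s, mu_w)
        mu_s = golden_min(lambda m: chi2(m, mu_w), 0.0, 5.0)
        mu_w = golden_min(lambda m: chi2(mu_s, m), 0.0, 5.0)
    return mu_s, mu_w
```

Because the prediction is linear in both scale factors and the Poisson $\chi^2$ is convex in the prediction, the two-parameter problem has a single minimum, so alternating one-dimensional searches converge to it.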

Table 3 contains the results of the fit in the $e\nu q\bar{q}$, $\mu\nu q\bar{q}$, and combined channels. The combined distribution of the RF output after the combined fit, and the same distribution with the background subtracted, are shown in Fig. 2. Fig. 2 also shows the background-subtracted dijet mass distribution, in which the resonant dijet signal peak is visible in the data. Each fit yields a WW + WZ cross section consistent with, though somewhat larger than, the expected SM value of $\sigma(WW+WZ) = 16.1$ pb. The combined lepton-channel fit yielded a cross section of 20.2 ± 2.5 (stat) ± 3.6 (sys) ± 1.2 (lum) pb, which is slightly less than one standard deviation from the expectation.
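Two numbers in this paragraph can be checked directly: adding the statistical, systematic, and luminosity uncertainties in quadrature (assuming they are independent) gives the total of about 4.5 pb quoted in the abstract, and dividing the excess over the SM value by that total gives about 0.9, i.e. slightly less than one standard deviation. The SM theory uncertainty is ignored in this simple estimate.

```python
import math

STAT, SYST, LUMI = 2.5, 3.6, 1.2   # pb, combined RF fit (Table 3)
MEASURED, SM = 20.2, 16.1          # pb

total = math.sqrt(STAT ** 2 + SYST ** 2 + LUMI ** 2)  # independent components in quadrature
pull = (MEASURED - SM) / total                        # deviation in units of the total uncertainty
print(f"total uncertainty = {total:.2f} pb, deviation from SM = {pull:.2f} s.d.")
```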

Table 3 also provides the result of performing the measurement using only the dijet mass distribution. As expected, this measurement was less precise than the one from the RF, because the RF discriminates better between signal and background.
Figure 2: Distributions after the cross section fit of the RF distribution: (a) RF output; (b) RF output with background subtracted; (c) dijet mass with background subtracted.



Table 3: The signal cross section determined from a simultaneous fit to the data of the WW + WZ cross section and the normalization factor for W+jets.

Channel                          Fitted signal $\sigma$ (pb)
$e\nu q\bar{q}$ RF output        18.0 ± 3.7 (stat) ± 5.2 (sys) ± 1.1 (lum)
$\mu\nu q\bar{q}$ RF output      22.8 ± 3.3 (stat) ± 4.9 (sys) ± 1.4 (lum)
Combined RF output               20.2 ± 2.5 (stat) ± 3.6 (sys) ± 1.2 (lum)
Combined dijet mass              18.5 ± 2.8 (stat) ± 4.9 (sys) ± 1.1 (lum)



8 Significance 

Arguably as important as the cross section measurement is its significance. The expected and observed significances were obtained via fits of the signal-plus-background hypothesis to MC pseudo-data drawn from the background-only hypothesis. The pseudo-data samples were generated from random Poisson trials seeded by the predicted number of background events, smeared within the systematic uncertainties. A measurement of the signal cross section was performed on each background-only pseudo-data distribution just as for the data. The expected significance was derived from the fraction of outcomes that yielded a cross section at least as large as the SM prediction for WW + WZ production, and the observed significance from the fraction of outcomes above the measured cross section.
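The pseudo-experiment procedure can be sketched in a few functions. This is a toy under stated assumptions: a single fully correlated Gaussian shift stands in for the full set of systematic uncertainties, the signal is estimated with a weighted least-squares formula rather than the full template fit, and the signal strength is expressed relative to the SM prediction, so a threshold of 1.0 corresponds to the expected significance and the measured-to-SM ratio to the observed one. Python's standard library has no Poisson sampler, so one is built from exponential inter-arrival times. All names are hypothetical.

```python
import random
from typing import List, Sequence


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson deviate: count arrivals of a rate-lam process in unit time (adequate for modest lam)."""
    if lam <= 0.0:
        return 0
    n, t = 0, rng.expovariate(lam)
    while t < 1.0:
        n += 1
        t += rng.expovariate(lam)
    return n


def fit_signal(data: Sequence[float], bkg: Sequence[float], sig: Sequence[float]) -> float:
    """Toy signal-strength estimate: weighted least squares with the background fixed at nominal."""
    num = sum((d - b) * s / b for d, b, s in zip(data, bkg, sig))
    den = sum(s * s / b for b, s in zip(bkg, sig))
    return num / den


def pseudo_experiment_p_value(bkg: Sequence[float], sig: Sequence[float], rel_syst: float,
                              threshold: float, n_toys: int = 2000, seed: int = 0) -> float:
    """Fraction of background-only pseudo-experiments whose fitted signal is >= threshold."""
    rng = random.Random(seed)
    n_above = 0
    for _ in range(n_toys):
        shift = rng.gauss(0.0, 1.0)  # one fully correlated systematic shift per pseudo-experiment
        smeared: List[float] = [max(b * (1.0 + rel_syst * shift), 0.0) for b in bkg]
        pseudo_data = [poisson(rng, lam) for lam in smeared]
        if fit_signal(pseudo_data, bkg, sig) >= threshold:
            n_above += 1
    return n_above / n_toys
```

Very small p-values, such as those in Table 4, require correspondingly many pseudo-experiments, since the estimate is a simple counting fraction.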

Table 4 gives the probability (p-value) and Gaussian significance (number of standard deviations for the corresponding Gaussian confidence level) for the expected and observed outcomes corresponding to the measurements in Table 3. Again, the merit of the multivariate classifier is evident: while the observed significance using the dijet mass was 3.3 standard deviations, the RF yielded an observed significance of 4.4 standard deviations.



Table 4: Expected and observed p-values obtained by comparing the measurement with background-only pseudo-experiments, and the corresponding significance in number of standard deviations (s.d.) for a one-sided Gaussian integral.

Channel                          Expected p-value (significance)     Observed p-value (significance)
$e\nu q\bar{q}$ RF output        $6.8\times10^{-3}$ (2.5 s.d.)       $3.2\times10^{-3}$ (2.7 s.d.)
$\mu\nu q\bar{q}$ RF output      $1.8\times10^{-3}$ (2.9 s.d.)       $5.2\times10^{-5}$ (3.9 s.d.)
Combined RF output               $1.5\times10^{-4}$ (3.6 s.d.)       $5.4\times10^{-6}$ (4.4 s.d.)
Combined dijet mass              $1.7\times10^{-3}$ (2.9 s.d.)       $4.4\times10^{-4}$ (3.3 s.d.)
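The conversion from p-value to significance used in Table 4 (a one-sided Gaussian integral) can be checked with Python's standard library; the sketch below reproduces every entry of the table to the quoted precision. The function name is hypothetical; `statistics.NormalDist.inv_cdf` is a standard-library method (Python 3.8+).

```python
from statistics import NormalDist


def one_sided_significance(p_value: float) -> float:
    """z such that P(Z >= z) = p for a standard normal Z (one-sided Gaussian integral)."""
    return NormalDist().inv_cdf(1.0 - p_value)


# (p-value, quoted significance in s.d.), transcribed from Table 4
TABLE4 = {
    "e RF expected": (6.8e-3, 2.5), "e RF observed": (3.2e-3, 2.7),
    "mu RF expected": (1.8e-3, 2.9), "mu RF observed": (5.2e-5, 3.9),
    "combined RF expected": (1.5e-4, 3.6), "combined RF observed": (5.4e-6, 4.4),
    "dijet mass expected": (1.7e-3, 2.9), "dijet mass observed": (4.4e-4, 3.3),
}

for name, (p, quoted) in TABLE4.items():
    print(f"{name}: p = {p:.1e} -> {one_sided_significance(p):.2f} s.d. (quoted {quoted})")
```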



9 Conclusions 

Using semi-leptonic decay channels, we measured $\sigma(WW+WZ) = 20.2 \pm 4.5$ pb in proton-antiproton collisions at $\sqrt{s} = 1.96$ TeV. This is consistent with the SM prediction of $\sigma(WW+WZ) = 16.1 \pm 0.9$ pb, as well as with previous measurements of WW and WZ production in fully leptonic final states. The significance of the measurement is 4.4 standard deviations above the background, indicating the first direct evidence for WW + WZ production with semi-leptonic decays at a hadron collider. Finally, this analysis demonstrates the ability to measure a small signal in a large background for a final state of direct relevance to searches for a low-mass Higgs boson, and it provides a validation of the analytical methods used in searches for Higgs bosons at the Tevatron.